Branch Write Back SRU FPU 3 FPU 2 FPU 1 FX
Author
Abstract
This paper presents the performance of DSP, image and 3D applications on recent general-purpose microprocessors using streaming SIMD ISA extensions (integer and floating point). The 9 benchmarks we use for this evaluation have been optimized for DLP and cache use with SIMD extensions and data prefetch. The result of these cumulative optimizations is a speedup that ranges from 1.9 to 7.1. All the benchmarks were originally computation bound, and 7 become memory-bandwidth bound with the addition of SIMD and data prefetch. Quadrupling the memory bandwidth has no effect on the original kernels but improves the performance of the SIMD kernels by 15-55%.
Related resources
Recycle of Immobilized Endocellulases in Different Conditions for Cellulose Hydrolysis
The immobilization of cellulases could be an economical alternative for reducing the cost of enzyme application. The derivatives obtained by immobilization were evaluated in recycles of filter paper hydrolysis. The immobilization process showed that the enzyme recycles were influenced by the shape (drop or sheet) and type of the mixture. The enzyme was recycled 28 times for sheets ...
Microsoft Word - FPU.docx
Floating-point arithmetic units (FPUs) are of paramount importance in applications that involve intensive mathematical operations. However, previous FPU implementations either require much manual work or support only special functions (e.g. reciprocal, square root, logarithm, etc.). In this paper, we present an automatic method to synthesize a general FPU by aligned partition. Based on the novel pa...
Method for Ultra-precision FPU Integration based on Fine-Grained Control
In general, methods for FPU integration decouple the FPU from the processor, so communication between them requires software intervention and ultra-precision FPUs are unsupported. To avoid this problem, a method based on fine-grained control for integrating the FPU into a RISC processor is proposed in this paper. In terms of operand width of floating-point instructions, the metho...
IBM PowerPC 440 FPU with complex-arithmetic extensions
The PowerPC 440 floating-point unit (FPU) with complex-arithmetic extensions is an embedded application-specific integrated circuit (ASIC) core designed to be used with the IBM PowerPC 440 processor core on the Blue Gene/L compute chip. The FPU core implements the floating-point instruction set from the PowerPC Architecture and the floating-point instruction extensions created to aid in matri...
Resonant normal form for even periodic FPU chains
We investigate periodic FPU chains with an even number of particles. We show that near the equilibrium point, any such chain admits a resonant Birkhoff normal form of order four which is completely integrable—an important fact which helps explain the numerical experiments of Fermi, Pasta, and Ulam. We analyze the moment map of the integrable approximation of an even FPU chain. Unlike the case o...